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ABSTRACT 

We describe a new algorithm for finding substructures within dark matter haloes from N- 
body simulations. The algorithm relies upon the fact that dynamically distinct substructures 
in a halo will have a local velocity distribution that differs significantly from the mean, i.e. 
smooth background halo. We characterize the large-scale mean field using a coarsely grained 
cell-based approach, while a kernel smoothing process is used to determined the local velocity 
distribution. Comparing the ratio of these two estimates allows us to identify particles which 
are strongly cluster in velocity space relative to the background and thus resident in substruc- 
ture. From this population of outliers, groups are identified using a Friends-of-Friends-like 
approach. False positives are rejected using Poisson noise arguments. This approach does not 
require a search of the full phase-space structure of a halo, a non-trivial task, and is thus com- 
putationally advantageous. We apply our algorithm to several test cases and show that it iden- 
tifies not only subhaloes, bound overdensities in phase-space, but can recover tidal streams 
with a high purity. Our method can even find streams which do not appear significantly over- 
dense in either physical or phase-space. 
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1 INTRODUCTION 

The dark matter in the haloes that surround galaxies such as our 
own is often modeled as a smooth distribution in both position 
and velocity space. In particular, the spatial distribution is gener- 
ally assumed to fall off monotonically from the halo center, and 
the local velocity distribution is assumed to be Maxwellian. These 
assumptions provide a useful starting point for making model pre- 
dictions for dark matter detection experiments (e.g. Lewin & Smith 
1996). However, the haloes that form in cosmological simulations 
are anything but smooth, even relaxed, virialized haloes that have 
not undergone any major mergers in recent history. In particular, 
they contain subhaloes, which range in mass from a few percent 
of the host halo's mass down to the resolution limit of the simula- 
tion (e.g. Moore et al. 1999; Gao et al. 2004; Diemand et al. 2006, 
2008; Springel et al. 2008). 

Subhaloes are but one example of substructure found in sim- 
ulated haloes. As subhaloes traverse the host they will experience 
mass loss and leave behind debris in the form of shells and streams. 
In simulations these coherent streams of particles can exist for a few 
orbital periods before eventually dissolving into the smooth halo 
background. The resulting superposition of streams and subhaloes 
produces a complex velocity structure (e.g. Fairbairn & Schwetz 
2009; Kuhlen et al. 2010), and even the "smooth" background of 
a halo is composed of many overlapping fundamental dark mat- 



ter streams (Stiff & Widrow 2003; Vogelsberger & White 2010). 
Lastly, and perhaps most practically, the number, velocity profile 
and mass of these streams will have important ramifications for 
direct and indirect dark matter searches (e.g. Stiff et al. 2001; Vo- 
gelsberger et al. 2008, 2009; Kuhlen et al. 2010). 

The interest in streams is not limited solely to dark matter. 
Galactic stellar haloes are thought to form, in part, from the accre- 
tion and subsequent disruption of satellite galaxies (Johnston et al. 
2008). A prime example is the Sagittarius stream associated with 
the Sagittarius dwarf (Ibata et al. 1994, 2001). The PANDAS sur- 
vey has found evidence of stellar streams in M3 1, which may come 
from disrupted satellites (McConnachie et al. 2009). These stel- 
lar substructures offer an observable window into galaxy forma- 
tion history and searches for such structures are just beginning in 
earnest. Data from future instruments such as GAIA (Mignard et al. 
2008) will allow us to test the ACDM model by comparing the 
number of observed stellar streams to the number predicted from 
cosmological simulations. 

The problem with finding substructures is non-trivial and a 
number of techniques have been developed to find subhaloes. These 
include SKID (Stadel 2001), SUBFIND (Springel et al. 2001), 6DFOF 
(Diemand et al. 2006), ENLINK (Sharma & Johnston 2009), and 
HSF (Maciejewski et al. 2009). Algorithms such as SUBFIND search 
for physical overdensities or clustering in configuration space and 
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are well suited to finding subhaloes. Simulations show that the con- 
trast between substructures and the background is more prominent 
in phase-space and action space (Helmi & de Zeeuw 2000; Gomez 
& Helmi 2010). Algorithms such as ENLINK and HSF make use 
of this increased contrast by using phase-space density to identify 
substructures. In general, most of these algorithms search for lo- 
cal peaks in either physical or phase-space density and estimate the 
outer boundaries enclosing these peaks. A number of studies have 
searched for completely disrupted subhaloes by identifying parti- 
cles that were once bound to a subhalo at some early time and that 
are no longer bound to this subhalo (Helmi et al. 2003; Vogels- 
berger et al. 2009; Xu et al. 2009). This process is laborious and 
computationally intensive since past snapshots must be stored and 
tracked from output to output. 

Here we present a new algorithm, the STructure Finder (STF) 
designed to find velocity substructures within dark matter haloes. 
Substructures occupy specific regions in orbital space and have 
smaller velocity dispersion than the virialized background and con- 
sequently should appear more strongly clustered in velocity space 
relative to the background. In short, we identify candidate streams 
(and subhaloes) by finding groups of particles that "stand out" 
against the velocity distribution of the background. We model the 
local velocity distribution as a multivariate Gaussian and therefore, 
our method is most sensitive to substructures on the tails of the 
Gaussian, that is, substructures that are cold but that have a large 
bulk velocity. Once we have identified these outlying particles we 
then use a Friends-of-Friends-like algorithm to link them. 

Our paper is organized as follows: In section §2, we outline 
our algorithm, determine the algorithm's optimal parameters, and 
discuss what types of substructure can be identified. We test our 
algorithm with toy models and discuss the results in §3. In section 
§4, we analyze a more realistic model of a dark matter halo. The 
paper concludes in §5 with a summary and discussion. 



2 THE STRUCTURE FINDER ALGORITHM 

The STructure Finder (STF) is designed to find velocity substruc- 
tures in N-body haloes by identifying regions of the velocity dis- 
tribution that differ from the smooth "Maxwellian" background. 
Since subhaloes are remnants of isolated haloes that formed at an 
earlier epoch, they should be physically denser and have smaller ve- 
locity dispersions than the host in which they now reside, and there- 
fore have higher phase-space densities. In principle, if we could 
accurately measure the phase-space structure of a halo, both sub- 
haloes and tidal streams would be easy to identify since Liouville's 
theorem states that their phase-space volume and density remains 
constant as the phase-space region they occupy is distorted by grav- 
itational evolution. 

Unfortunately, current N-body simulations do not have the res- 
olution required to measure phase-space structure at the accuracy 
needed to identify diffuse tidal streams 1 (Stiff et al. 2001), though 
they can easily resolve the presense of subhalos (Knebe et al. 201 1). 
Furthermore, the highly anisotropic nature of the velocity distri- 
bution of streams makes it difficult to accurately measure their 
phase-space density. Finally, from a practical perspective, the tech- 
niques used to estimate the local phase-space density are substan- 
tially more computationally intensive than those used to estimate 
the physical density (Ascasibar & Binney 2005; Sharma & Stein- 
metz 2006; Sharma & Johnston 2009). 

1 Technically, the volume is only conserved for the fine-grained distribu- 



Our scheme relies on the assumption that a halo's velocity dis- 
tribution can be split into a virialized background and substructures. 
Specifically, we assume that within a small volume of a simulated 
halo the distribution function is approximately separable so that one 
can write 



F(x,v) = p(x)/(v), 



(1) 



where p(x) and (v) are the physical and velocity density, respec- 
tively. The ratio of the distribution function of particles associated 
to a substructure with a dispersion as to that of the background 
distribution with <7b g at the same phase-space coordinates is 
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(2) 



where here we have assumed the velocity distributions of both the 
substructure and the background are Gaussian. 

This ratio has three distinct terms, the physical density con- 
trast, the contrast in the dispersions and a ratio of Gaussian terms. 
Subhaloes are physically dense and dynamically cold. Conversely, 
tidal streams may have a negligible density contrasts and may 
also have velocity dispersion comparable with the background. 
In both cases, the substructure's velocity distributions will differ 
from the background. Consequently, although the Gaussian term 
for the substructure will likely be of order unity, due to the offset 
in the velocity, <5v = v — Vb g , this ratio is enhanced an factor 
of exp((5w 2 /2(7b g ). It is this exponential factor that is key to our 
algorithm. 

Thus instead of measuring the phase-space density we mea- 
sure the local velocity density of a particle, /i(v), and divide out 
the expect velocity density of the background at the particle's ve- 
locity, /bg(v). This ratio is used to identify particles that belong 
to velocity distributions which differ from the background. This ra- 
tio is particularly effective at identifying substructures with large 
relative bulk velocities regardless of their actual physical or phase- 
space density. 

The STF algorithm is composed of two main parts: the first 
part determines the likelihood of a particle lying outside the back- 
ground velocity distribution; the second links these outliers using 
a Friends-Of-Friends algorithm. We summarize the steps of our al- 
gorithm in Fig. 1. 



2.1 Finding Outliers 

2.1.1 Background velocity distribution 

We assume the mean background velocity density is characterized 
by a Multivariate Gaussian . Thus the expected background veloc- 
ity density for a particle k with velocity v^ is 

, (v x ^H(v^v)r'(v t -v)] 

/bgiv fc j - (2tt) 3 / 2 |E| 1 /2 ' w 

where v is the average velocity, and E is matrix representation of 
the velocity dispersion tensor about v, erf ' j. Both v and of ,, depend 
of position relative to the halo's center-of-mass. 



tion function and N-body simulations sample the coarse-grained distribu- 
tion function. Furthermore, N-body codes do not explicitly solve the colli- 
sionless Boltzmann equantion and thereby explictly conserve this volume. 
2 Vogelsberger et al. (2009) found that the velocity distribution from the 
central region of a high resolution simulation of a galactic halo was approx- 
imately characterized by a Multivariate Gaussian. The observed deviations 
away from this prediction may be due to the presence of substructure and 
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where m^ is the mass in the k th bin and m to t is the total mass. 
From the perspective of information theory, this process amounts 



may depend on the details of the halo's formation history. Hence a reason- 
able first order assumption is that the background is described by a Multi- 
variate Gaussian. 



Figure 1. Flow chart of STF algorithm. 

The key to accurately characterizing the mean field is split- 
ting the halo into appropriately sized volumes or cells. These vol- 
umes should contain enough particles so that the statistical error 
within a cell is negligible, but not be so large that density and po- 
tential varies greatly across the cell. The requirement that cells not 
be too large can be understood as follows: In a perfectly spherical, 
isotropic halo, v = but a 2 ~ GM(r)/r. If the halo is decom- 
posed into spherical shells and the dispersion calculated using all 
the particles in a shell, the estimator for a 2 is effectively an aver- 
age of GM(r)/r over the radial thickness of the shell. Using a very 
thick shell will result in a biased estimator. 

To ensure that a balance is achieved between these compet- 
ing effects, we split the halo into cells using a kd-tree (Friedman 
et al. 1977; Appel 1985; Barnes & Hut 1986) so that each cell con- 
tains iV C eii = N/2 a particles, where N is the number of particles 
in the halo and a is the depth of the tree. The tree is constructed 
by iteratively splitting the system along the spatial dimension that 
maximizes Shannon entropy, S. This quantity is estimated for each 
physical dimension in a volume containing n particles by binning 
them in ribins that span the extent of the dimension using the for- 
mula 



to splitting the system along the dimension that contains the max- 
imum amount of information. In this case, the splitting dimension 
will be the one with the greatest amount of variation in the parti- 
cles' positions. This method effectively minimizes the variation in 
particle density across any given cell volume. 

Once the halo has been split into cells, we calculate the center- 
of-mass velocity and the velocity dispersion tensor for each cell, 
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(5) 
(6) 



where M ce ii is the mass contain in the cell. 

Estimating /bg(vfc) using only the v and erf j of the cell con- 
taining the particle can give rise to grid effects, that is sharp transi- 
tions in /b g at a given velocity between cells. These discontinuities 
are particularly noticeable for cells dominated by a single substruc- 
ture. To minimize these effects, we interpolate v and afj (techni- 
cally we interpolate the inverse of the velocity dispersion tensor, 
E" 1 ) at a particle's physical position with a inverse-distance in- 
terpolation scheme using the cell containing the particle and the 
six nearest neighbouring cells since the mean field should vary 
smoothly from cell to cell. Explicitly, we use 



N 



»(x)t 



i=0 E j=0 W i( X )' 



(7) 



where u is the quantity we wish to determine at a position x based 
on cells with center-of-mass positions Xj , and Wi (x) = x — x* | - : . 



2.1.2 Local velocity distribution 

We estimate the local velocity density of a particle, /i(Vfc), using 
a kernel-scheme with an Epanechnikov smoothing kernel (Sharma 
& Steinmetz 2006). This density is calculated using 7V V nearest ve- 
locity neighbours from the set of N ss nearest physical neighbours, 
where N v ^ N ae . So as to not introduce any grid effects we do 
not limit the search for a particle's nearby physical neighbours to 
particles that are in the same cell. Using a small number of velocity 
neighbours from a larger set of physical neighbours can give a bi- 
ased estimate of the local velocity density. However, since our goal 
is to highlight clustering in velocity space, this is perfectly accept- 
able. We measure the local velocity density using a small number 
of nearest velocity neighbours, ie: N v ~ 10—100. The set of phys- 
ical neighbours from which the local velocity density is estimated 
should be substantially larger than the number used to calculate the 
velocity density. We generally fix N sc = 32iV v . 

In summary, this part of the algorithm has two key parameters, 
TYceii and N v , which effectively define the volumes used to measure 
the mean and local velocity densities. 



2.1.3 Comparing the distributions 

Next, we consider the logarithmic ratio of the local and background 
velocity distributions, 



fc*=ln[/i(vjk)//i*(v h )]. 



(8) 



Before we address what 7£-value indicates that a particle belongs to 
a substructure, we need to consider the scatter that may be present 
in a smooth halo without any substructure. 
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Figure 2. The normalized distribution of 1Z for a smooth NFW halo. In the 
top panel wel plot the distribution in several different radial bins equally 
spaced in logr out to a radius of IAR&, where R& is the virial radius 
corresponding to an overdensity A = 200 and is at lla 0i jj. For each 
curve, we indicate the centre of the radial bin relative to the scale radius 
and also the number of particles in that bin u n , in units of 10 3 /kpc . In 
the bottom panel we show the global distribution (solid line), along with 
the Skewed-Gaussian fit (dashed line). Also shown are two vertical lines at 
approximately 75. — sa and TZ + a corresponding to the region that is used 
to determine the scatter of the background distribution, an . 



In order determine the form of 7?.-distribution for a smooth 
halo, we examine several smooth, spherical haloes generated by 
GALACTICS (Kuijken & Dubinski 1995; Widrow & Dubinski 2005; 
Widrow et al. 2008). GALACTICS generates self-consistent equi- 
librium models of spherical haloes with isotropic velocities and a 
density profile is given by 

Mr) =^-^(r/a )- a (l + r/o o )- (/, -° 



X C(r,r t ,Sr t ), 



(9) 



where a is the characteristic radius of the halo, V is the char- 
acteristic velocity dispersion, and C is a truncation function that 
smoothly goes from unity to zero at the truncation radius r t over 



a distance Sr t . For an NFW (Navarro et al. 1997) density profile, 
a — 1 and /3 = 3. 

In figure 2, we show both the local and global 7?.-distribution 
for a galactic mass NFW halo composed of 10 6 particles calculated 
iVcdi = 1953 (that is 512 volumes), N v = 32 (A/ so = 32A/ V = 
1024). The 7?.-distribution does not appear to exhibit a strong radial 
dependence and is both locally and globally well characterized by a 
Gaussian, but is slightly skewed. However, it is important to recall 
that we use a biased estimator of the local velocity density which 
can introduce a skew or asymmetry in the distribution, even for a 
smooth halo. 

For generality, we use a Skew-Gaussian to characterize the 1Z- 
distribution: 
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(10) 



where s is a measure of the skew or asymmetry, and Q(x) is the 
Heaviside function. We fit this function to the binned distribution 
using a nonlinear least squares Levenberg-Marquardt algorithm, as- 
suming the bins are independent and have Poisson errors. The fit 
shown in Fig. 2 has a reduced \ 2 ~ 1- 

Our primary goal at this point is to characterize, and ideally 
minimize, the dispersion in the background distribution, an- This 
parameter, along with the skew and mean, depends on both N v and 
iVccii. The optimal values for N v and iV C ell should minimize an 
and 1 1 — s | . In figure 3 we plot an and s as a function of N v and 
N cs \\. This figure shows that the distribution is quite sensitive to 
N v , but significantly less so to N ce u. At small N v , an oc N v , 
indicating that an, is dominated by the Poisson noise in the /; (v) 
estimator. The optimal N v appears to be 32-64. We find that this 
value appears to be independent of A^en, N V /N SB , and the number 
of particles in the halo so long as N sc < 1%Nh and A^c is a 
few times larger than A^. If one uses too large a value for A^,,, one 
will no longer be measuring the local velocity density function. The 
resulting 1Z distribution will then be highly skewed. We have found 
that the local velocity density function begins to produce a strong 
bias or skew for N sc < 256. This implies that the minimum number 
of particles a halo must contain for our method to work effectively 
with little or no bias is Ah ^ 10 4 . 

The weak dependence of an on N cc \\ indicates that an is 
dominated by the scatter in the fi (v) estimator. The fact that the 
dispersion increases as Af ce ii increases argues for using a small 
N cc \\. However, this conflicts with the need to ensure cells aver- 
age over any substructures present in the halo. In general, we do 
not know, a priori, what substructures are present in a halo. How- 
ever, we appeal to the fact that neither an nor s strongly depend on 
Acd! and that subhaloes typically have mass fractions of < 10~ 2 
to determine an initial optimal Ac C n. By setting Af cc n ~ 10 _2 A r H, 
that is decomposing the halo into 128-1024 cells, we ensure that 
any compact substructures composed of < 10 -2 Ah will not sig- 
nificantly affect /bg(v). 

As briefly mentioned before, we have found that in our test 
cases the 1Z distribution does develop a systematic radial depen- 
dence for very large cell sizes, Acd! > 1%Ah- This effect is shown 
in Fig. 4, where we plot the parameters characterizing the local 7£- 
distribution for the radial bins shown in the top panel of Fig. 2 
calculated using two different cell sizes, our fiducial value and a 
cell size four times as large (N cc n = 0.8%A?h). Both the mean 
and the variance do not significantly change even for radii near the 
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Figure 3. The variation in a-ji & s with N v & 7V ce n relative to fiducial 
parameters of N v = 32 & N ce n = 1953 (or 512 cells). Circles correspond 
to a-ji, squares to s, filled symbols to N v and open symbols to N CB u. 
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and s respectively. We also indicate the virial radius with a dashed vertical 
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virial radius nor do they display a strong radial trend. However, the 
skew does appear to develop a trend when using the larger cell size, 
increasing with increasing radius. It should be noted that though a 
systematic dependece is present, it is not strong even with this large 
a cell. This result indicates the effectiveness of our decomposition 
method. 

Our working hypothesis is that that changing the halo's bulk 
properties, such as is density profile, morphology or velocity 
anisotropy, will not change the form of the 7£-distribution. We 
have tested our hypothesis using smooth haloes with different den- 
sity profiles ranging from a cored isothermal profile to a steep 



r (1 + r/oo) -1 . These tests confirm the ^-distribution is in- 
sensitive to the density profile. We argue that changing other bulk 
properties such as the morphology will also leave the distribution 
unchanged. 

Identifiable substructures will introduce secondary features in 
the ^-distribution that are anticipated to lie outside the scatter in 
TZ inherent in the smooth background. To account for this scatter, 
we estimate TZ & an- These quantities are calculated using the 
full width half max region about the most probable 1Z- value as this 
dominant peak should correspond to the smooth background. Once 
we have determined the mean and the dispersion, we calculate 



C k = (1Zk - 1Z)/a-K. 



(11) 



So long as the ^-distribution is dominated by a singly peaked dis- 
tribution, C is effectively a normalized deviation. 



2.7.4 Identifiable substructures 

To address what properties that will make a substructure appear dy- 
namically distinct, let us consider a one dimensional Gaussian ve- 
locity that comprises of a substructure with mean v s and dispersion 
of embedded in a background with a^ g . For a particle belonging 
to the velocity substructure, and ignoring the scatter in 1Z for sim- 
plicity, 



7? : 



2<T? 



+ 



2 < 



ln(rJb g /cr s 



(12) 



where the first term corresponds to the local density, the second 
term to the estimated background density, and the third term is a 
ratio of the dispersions. 

A particle belonging to this substructure should appear more 
strongly clustered in velocity space than the background and thus 
have 1Z > 0, or equivalently C > 0. These particles will have 
velocities v = v s + aa s , where a follows a normal distribution. For 
the case where v s = a particle will appear dynamically distinct if 



0< 



o-a/obg < exp 



+ In (o"b g /o" s 



1- 






1 



(13) 



Therefore, to find a fraction erf(a/v2) of the particles belonging 
to the substructure, the dispersion must be exponentially smaller. 
In the case where the dispersions are the same, a similar fraction 
will be identified so long as the offset between the mean velocities 
is significant, 



aa\ > lata. 



(14) 



In the full three dimensions, the required difference in veloc- 
ity or dispersions in any given dimension effectively decreases by 
factor of ~ l/v3. 



2.2 Linking Outliers 

2.2.7 FOF-like algorithm 

Once the C distribution has been determined we apply a cut and 
only keep particles with C ^ £th- Since the C for a smooth halo 
follows an approximate Normal distribution, one could use £th > 
2 as this would eliminate > 97.5% of the background population in 
a smooth halo. By increasing £ t h, we can decrease the likelihood 
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of finding artificial groups but this comes at the expense of reducing 
the fraction of substructure that can be recovered. We will return to 
this issue in §3. 

The subset of particle determined through this cut is searched 
using a Friends-of-Friends-like algorithm where we link particles i 
and j iff 



(x« 



a 



<i, 



i/Vr «; vi/vj < v r 

cos op ^ — , 

ViVj 



(15a) 
(15b) 
(15c) 



where £ x is the physical linking length, V r is the velocity ratio, and 
cos O op is the velocity cosine. The first criterion is the standard 
FOF criterion, linking particles that are separated by a distance less 
than £ x . The last two criteria ensure that the particles have similar 
velocities. The reason the form of the velocity criteria differs from 
the configuration criterion, that is the reason we don't use (v* — 
Vj) 2 /^v < 1, is that tidal streams may have large velocities and 
dispersions. Consequently, scaling an allowed velocity dispersion, 
£y is non-trivial. In total, this FOF algorithm has 4 parameters, £th, 
£ x , V r and cos op . 

As with all FOF algorithms, it is possible to find spurious 
groups with a poor choice of linking parameters and determining 
the optimal FOF parameters is not trivial. To guide the choices of 
these parameters we can consider either probabilistic or physical 
arguments. Consider first V r , which is simply a speed ratio. V r ~ 1 
would be a conservative choice whereas V r 3> 1 will undoubt- 
edly result in many spurious links. The related velocity parame- 
ter cos op also has two limiting cases, the conservative value of 
cos O op ~ 1 and the relaxed condition of cos O op ~ — 1. 

The £ x linking-length parameter can significantly influence 
the results and, in the form used, there is no specific value to ap- 
peal to without prior knowledge of the halo structure. However, 
haloes of the virial overdensity Ap Dg are identified in cosmological 
simulations using £ x = lA" 1 ' 3 times the inter-particle spacing. 
Equivalently, given a halo containing Na particles in a radius Ra, 
^x,h = (2-k/Na) 1 Ra- A natural choice to search for subhaloes 
would be £ x < ^ x ,h- However, we are also interested in finding 
physically diffuse tidal streams, therefore a reasonable choice is 

<?x~^x.H. 



2.2.2 Removing artificial groups 

Regardless of the choice of parameters, it is always possible that 
random statistical fluctuations will give rise to artificial groups. In 
order to remove these artificial groups, we ensure that a group's £ 
is significant relative to Poisson noise using a variation of a signif- 
icance parameter as outlined in Aubert et al. (2004). Their ADAP- 
TAHOP algorithm identifies subhaloes by examining the physical 
density of particles and the groups thus found are required to have 
an average physical density that is significant relative to a threshold 
value when compared to Poisson noise. However, since our goal is 
to not only find physically overdense subhaloes but possibly physi- 
cally underdense but dynamically distinct substructures, we cannot 
use density or phase-density. Instead we use the group's average 
{£) value. If the group was purely artificial, one would expect (£} 
to be within Poisson noise of the expected £ calculated using the 
background distribution and the threshold £ t h imposed. Thus, we 
require that a group composed of N particles have an average (£} 



that satisfies 

(£) > £(£ th ) (l + v/^N 
Here r\ is the required significance level and 



(16) 



r — _H5 = *-^ (\T) 

l-erf(£ th /V2)' 



J e~ x / 2 dx 



We also require that a group must have N ^ A m ; n . If the group 
does not satisfy the first criterion, the group member with the 
smallest £ value is removed till the criterion is satisfied or until 
N < A m i n , in which case the group is removed. We generally set 
1) > 1 and A min = 20. 



3 TESTS WITH A SINGLE SUBHALO 

To test our STF code and optimize the algorithm's parameters, we 
generate self-consistent equilibrium models of generalized NFW 
haloes with isotropic velocities using GALACTICS. The simula- 
tions are run using the parallel N-body tree-PM code GADGET-2 
(Springel 2005). 

We first consider the case of a single subhalo that is stripped of 
particles by the tidal field of its (smooth) parent halo. For this test, 
we assume a host halo identical to the one described in the previous 
section, that is A#h = 10 12 Mq, a characteristic velocity of V ,h = 
344 km/s and a scale radius of a 0i H = 14.4 kpc. We embed a 
subhalo with a mass equal to 0.005 times the host halo mass. As 
with the host halo, the initial density profile of the subhalo is given 
by Eq. (9) with a — 1 and /3 = 3. The structural parameters for 
the subhalo are a 0i s = 2.6 kpc, V ,s — 90 km/s. The subhalo 
is placed on an circular orbit with radius of r apo ,s = 9a 0i H = 
0.8-Ra and an initial azimuthal period of ~ 4.6 Gyr, ignoring the 
effects of dynamical friction. As a result of its orbital radius and 
structural parameters, the subhalo's tidal radius is (a 3a ,s- The 
subhalo is simply embedded in the larger host halo as GALACTICS 
cannot account for the presence of an external gravitational field 
when generating a halo. As a consequence of the diffuse nature of 
the subhalo, embedding it in the host causes particles in the very 
outer regions with retrograde velocities relative to subhalo's orbital 
velocity to become unbound. 

The evolution of this subhalo is shown in Fig. 5 at three dif- 
ferent times: to, the start of the simulation; and t\ and £2 after two 
and four azimuthal orbits, respectively. After each azimuthal orbit, 
the subhalo loses about ~ 15% of its mass in the form of diffuse, 
leading and trailing tidal streams. The subhalo is an easily identi- 
fiable phase-space peak in the top row which appears to decrease 
with time as the subhalo loses mass. In contrast, the stream is not 
visible on this scale, though it should be noted that much of the par- 
ticles in the diffuse streams are denser than than their surroundings. 
This contrast between the tidal streams and the background is sig- 
nificantly enhanced once we look at their £ value. We will return 
to this particular point later in §3.5. 



3.1 Outlier Distribution 

We use 256 cells with A co n = 3926 (8-level tree) and N v = 32 
(N sc = 32A V = 1024) to calculate the distribution of £, the nor- 
malized deviations. In Fig. 6, we plot the initial £-distribution of 
the background and subhalo particles separately again at to, ti and 
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Figure 5. Projection of halo-subhalo system in the orbital plane for a box 2/?a across at three different times, t (left), t\ (middle), and ti (right). The top 
row is a scatter plot of all the particles in the system where the particles of originating from the subhalo plotted over top the particles in the background. 
Particles coloured according to their estimated density in phase-space, X = (x, v), calculated with KNBID using a logarithmic colour scale spanning 7 dex, 
going from low to high density as one goes from dark blue to green to dark red. The middle row shows the physical distribution of subhalo particles colour 
coded according to their C value. In this row, dark blue corresponds to C ^ 1 and dark red corresponds to C > 3. The bottom row shows the group found 
using our algorithm with particles colour coded according to their group along with associated purity p and recovery fraction r of the three largest groups (see 
discussion below for information on the purity and recovery fraction). Also shown are the center-of-mass of the halo, a circle at r = R& (solid), and another 
at r = a h (dashed). The three rectangles denoted by Ri, R2, and R3 outline the three cells shown in Fig. 6. 



£2. This top row of the figure shows that more than 80% of the sub- 
halo population has C > 2 at the times examined and is a small sec- 
ondary feature in the global C distribution. Initially the two dynam- 
ical populations are well separated and as expected this distinction 
erodes over time as the subhalo is disrupted. At all times £ > 4 
is dominated by the subhalo population. However, the separation 
between the two populations is greater than these results suggest 
since the subhalo particles are clustered in both configuration and 
velocity space. Halo particles with high-£ values are generally sta- 
tistical fluctuations and are thus unlikely to be clustered in either 
configuration or velocity space and therefore will not be linked. 



The middle row of Fig. 5 shows that the C values of the sub- 
halo particles depend on where these particles reside in the tidal 
streams. Particles in the more diffuse outer edges of the stream 
tend to have smaller C values than those in the more densely pop- 
ulated portions of the tidal stream. This is not a surprising result as 
it is reflective of particles becoming more dynamically similar to 
the background. But perhaps more importantly, a particle's C value 
does not appear to depend strongly on the particle's radial position 
in the host halo. This is most clearly seen at t\ where the densest 
portion of the leading stream, located at a ~ 5a ,H, and the trailing 
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Figure 6. The normalized /^-distribution. The top panel shows the distribution of the entire halo-subhalo system at three different times calculated with 
our fiducial values for halo particles (solid line) and subhalo particles (dashed lines). The bottom panel shows the distribution in three specific cells at ti 
containing different fractions of subhalo particles /g . The filled regions in the bottom panel show the distribution of particles belonging to groups identified by 
our algorithm The region to the right of vertical dashed line contains 80% of the subhalo population. We also show the purity and recovery fraction for each 
volume. 



stream, out past the virial radius, both have numerous particles with 
high C values. 



The bottom row of Fig. 6 shows the /^-distribution at ti for 
three cells located at different points shown in Fig. 5: Ri, located 
near in the progenitor subhalo and enclosing a substantial frac- 
tion of subhalo particles; R2, containing subhalo particles primarily 
from a region along the leading tidal stream; and R3 containing par- 
ticles from the trailing tidal stream. In Ri and R2, the /^-distribution 
appears to be composed of two populations even though R2 con- 
tains far fewer subhalo particles. In R3 the two dynamical pop- 
ulations are not well separated. Without knowing a priori which 
particles originated from the subhalo, secondary features in the C- 
distribution are not obvious. This poor separation may appear to 
indicate that C has a radial dependence since part of the reason 
for this poor separation is due to a larger number of background 
particles with high C values in R3 compared to Ri and R2. How- 
ever, most of the background particles with high C values originate 
from well outside the virial radius near the boundary of the sys- 
tem where the spatial sampling is very low. As a result of the poor 
spatial sampling, the local velocity density function estimator is no 
longer truly local and is strongly biased. In general however, dif- 
ferent populations may have overlapping C distributions and an C 
cut-based approach that does not consider particles configurations 
is unlikely to find streams successfully. Hence our FOF algorithm 
is a necessary step that utilizes the fact that the subhalo particles 
are in fact clustered in both configuration and velocity space. 



3.2 Linking parameters & groups identified 

We now turn to establishing the value of Cth- As noted earlier, 
based solely on the distribution observed for a smooth halo, a 
value of £th > 2 would eliminate ~ 97.5% of the background 
population from the search. The results shown in Fig. 6, suggests 
this is a reasonable cut-off, but we use an even more conserva- 
tive value of Cth — 2.8, to exclude almost all the halo parti- 
cles from the search. By looking for more significant outliers we 
can allow for larger velocity differences and physical distances. 



Our fiducial values are V r 



2 and cos0 



0.97. Due to the 



extended nature of the streams, we use a large linking length of 
£ x — 10Ra/N h ~ 5^ x ,h = 16 kpc, where the virial radius is 
defined using the overdensity of A = 200. 

To assess the accuracy of our algorithm we calculate the pu- 
rity and recovery fraction for the groups found (Sharma & John- 
ston 2009). First we construct a classification matrix A where each 
element Oj,j represents the number of particles that belong to an 
intrinsic group i that have been classified as belong to group j. We 
also note the cluster j for which Oj,j is a maximum and associate 
group j with the intrinsic group i. We construct a recovery classifi- 
cation matrix A' such that a' t j — cn t j if j is the recovered group for 
intrinsic group i and a[ j — otherwise. The purity of an identified 
cluster j is given by pj = ^2 i a'ij/Nj, where Nj is the total num- 
ber of particles associated with group j. The purity parameter P is 
simply the mean purity. The recovery fraction is riS^, a'ij/Ni, 
where Ni is the total number of particles associated with intrinsic 
group i. The recovery parameter is defined as the sum of every in- 
trinsic groups recovery fraction, R = ^\ n, and is anticipated to 
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decay with time as particles phase-mix with the background halo. 
We also calculate the mean recovery f. Together the quantities in- 
dicate how closely identified groups match intrinsic groups and the 
total number of recovered intrinsic groups found weighted by the 
fraction of that intrinsic group found. In this first example, there is 
only one intrinsic class. 

The groups found using these FOF parameters are shown in 
the bottom row of Fig. 5. At all three times, the central subhalo is 
identified. Once the subhalo has formed tidal streams, a fraction of 
the tidal streams are also identified. At all times, the purity of each 
group is high, > 0.9, and the total recovery fraction for the intrin- 
sic group is > 0.6. This fraction decreases with time due to two 
factors. Firstly, some portions of stream becoming sufficiently dif- 
fuse that the linking process does not identify them. Secondly, the 
natural phase-mixing of the subhalo particles with the background 
increases as time passes. Portions of the stream become very dif- 
fuse and though these particles are generally outliers, they are not 
linked with the £ x used. Furthermore, once a particle has been com- 
pletely unbound from the subhalo, it will begin to phase-mix with 
the background as time passes. Arguably, as the phase-mixing oc- 
curs the recovery fraction should be normalized by a reduction in 
the size of the intrinsic subhalo class and our values can be seen an 
underestimates of the true recovery. We also note that some of the 
most distant outliers in the tidal tails are not linked simply due to 
the choice of linking parameters. There is no trivial solution to this 
linking length issue which we discuss later in §3.4. 

At all times, only a very small number of halo particles are 
associated with the groups, ~ 100 at t\ and ti. In order to asses 
whether these particles are truly false positives we compare the 
tangential and radial velocities of the "misclassified" halo particles 
at t\ to the subhaloes particles in the same volume from ti to ti. 
During this period, most of the halo particles lie well within the 
envelop of subhalo particles' orbital velocities. Comparing the dis- 
tribution of velocities of the two populations using a Kolmogorov- 
Smirnov test. We find the time averaged probability that the halo 
particles belong to the same parent distribution of orbits as the sub- 
halo particles to be ~ 0.2 for both v r and v c over an entire orbital 
period between t\ and t^. Although the KS test is not very effec- 
tive at measuring how similar two samples are and this probability 
is not high, given the low numbers, this comparison suggests that 
not all these particles are false positives and may have been swept 
into the subhalo. Approximately ~ % of these background parti- 
cles lie within orbital space of correctly identified particles during 
this period. This in itself is an interesting result and worth further 
investigation, though a detailed analysis of how these background 
particles are swept up is beyond the scope of this work. 

Returning to Fig. 6, the recovery and purity fractions for the 
different cells provide useful information about the effectiveness of 
the linking process. Of most interest is the purity fraction in Ri and 
7?3, which are located in the diffuse leading and trailing streams 
respectively. The purity in these volumes is very high which is not 
what one would expect based solely on the observed ^-distribution, 
which might suggest a stronger inclusion of non-subhalo particles. 
Again, most of the subhalo particles above Cth are linked whereas 
a substantial fraction of the halo particles above Cth are not. These 
halo particles remain unlinked due to the fact their large C values 
are either due to statistical fluctuations or biased estimates for par- 
ticles lying well outside the virial radius at large physical distances 
from other particles, and, unlike the subhalo particles, are not clus- 
tered in phase-space. 
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Figure 7. The mean purity (filled points) and total recovery fraction (open 
points) as a function of C t h at three different times. 



3.3 Optimizing the choice of Cth 

Figure 6 also shows that there are subhalo particles below the 
threshold used. No choice of Cth is perfect and we must tolerate 
the inclusion of some background along with the loss of substruc- 
ture particles. Our fiducial choice of Cth was based on probabilistic 
arguments and investigation of the C distribution. A more rigorous 
determination of the optimal Cth requires searching for the value 
that maximizes both P and R. We show the dependence of these 
quantities on the threshold in Fig. 7 at three different times. Ini- 
tially, one can use a low threshold and still recover a large frac- 
tion of the subhalo with a high purity. Once the tidal streams have 
formed, the purity is relatively insensitive to the threshold used for 
Cth si 2.5 and converges to w 0.95 for C t h ^ 3. The differences 
in P between ti and £2 for Cth > 2.5 are small, especially con- 
sidering that as the system evolves, the subhalo will have swept up 
more of the background particles thereby resulting in P at £2 be- 
ing systematically underestimated relative to P atfi. The recovery 
fraction decreases as we increase Cth at all times and the sensitivity 
of R on Cth increases slightly with time as more of the subhalo's 
particles mix with the background. The overall decrease in R at a 
given Cth with time is due to the increasing amount of particles 
that are truly lost from substructure. 

Based on this model, Cth ~ 2 is clearly enough to ensure a 
large fraction of a subhalo is found with a very high purity at to, 
but that is not the case once tidal streams have formed. Since a true 
cosmological halo at any given instant will have both subhalos and 
streams, it is more constructive to look at P and R at £ Js £1 . At 
these late times there are diffuse streams that are physically well 
separated from their progenitor subhalo. The choice of Cth that 
maximizes P + R at these late times is ~ 2.8, our fiducial choice. 
This optimal threshold appears to hold true regardless of a stream's 
orbit, whether circular or radial. 



3.4 Merging candidate groups to increase the recovery 
fraction 

While recovery fractions are generally high, streams are never per- 
fectly recovered. The unlinked particles lie in the most diffuse re- 
gions of the tidal stream, and although many of these particles are 
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outliers, they are separated by distances larger than the linking- 
length l^. The presence of diffuse regions in the stream has also 
lead to the algorithm splitting the tidal stream into several groups 
at ti and £2 • While this splitting might seem to be cause for concern, 
we again emphasize it is primarily due to the very diffuse nature of 
certain portions of the tidal stream. For example the largest group 
found at £2, which is composed of 151 particles, has r — 0.027, 
spans approximately ~ 200 kpc and accounts for ~ 7% of the 
mass in the volume occupied by this group. The closest particles 
belonging to the neighbouring group, which contains the subhalo, 
lie just outside the velocity and spatial criteria used. 

The recovery fraction can be increased by lowering thresholds 
and increasing linking lengths, but in turn this will reduce the pu- 
rity. An adequate balance between the two must be chosen. Since 
tidal streams can be very diffuse and pass through one another there 
is no simple adaptive way of choosing a spatial linking length. A 
more robust iterative approach would use conservative values for 
the FOF parameters, find an initial set of candidate groups, relax 
the FOF criteria, generate a new subset and then search this new set 
for particles that meet our FOF criteria with previously identified 
candidate particles. 

We have found that this method works quite well, effectively 
merging several of the groups together, increasing the recovery 
fraction of the particles in the tidal streams by a factor of 2 while 
keep the purity > 0.90. The net recovery increases from 0.68 to 
0.75 and 0.65 to 0.75 at £1 and £2, respectively, keeping in mind 
that a majority of the particles still lie in the subhalo. However, we 
caution that this model is a best case scenario for this approach as 
there is only one substructure. Realistic haloes will contain numer- 
ous substructures which may overlap in configuration and velocity 
space and using the iterative method may merge substructures to- 
gether. 

3.5 The C and phase-space density contrast of tidal streams 

Numerous studies show that the contrast between the outer region 
of a subhalo relative to the background is much greater in phase- 
space than in physical space (Sharma & Steinmetz 2006; Elahi et al. 
2009; Helmi & de Zeeuw 2000; Sharma & Johnston 2009; Ma- 
ciejewski et al. 2009). Algorithms such as ENLINK and HSF use 
this increased contrast to identify these substructures. However, ac- 
curately computing the local phase-space density is a non-trivial 
task. In the following discussion, we show that the difficulty of ac- 
curately computing the phase-space density means that these algo- 
rithms might not find the extended tidal streams even though these 
structures are dynamically distinct. 

In figure 8, we compare the phase-space structure of the sub- 
halo particles relative to the background. To compare the proper- 
ties of correctly identified group members with the background, we 
first find the nearest JVnn neighbours in phase-space for each group 
member. Here we set A r NN = 32. The key to correctly identifying 
near neighbours in phase-space lies in determining the optimal lo- 
cally adaptive metric describing the local phase-space volume. We 
follow the method outlined in ENBID to find these nearest neigh- 
bours. 

Next, for each correctly identified group member we deter- 
mine /^ N , the fraction of nearest phase-space neighbours of a 
group member that are also members of the group as shown the 
left panel of Fig. 8. Prior to the formation of tidal tails, most group 
members' nearest phase-space neighbours are also members of the 
group. The initially small fraction of particles with /^™ ~ 0.5, 
~ 0.03, reside in the outer edges of the subhalo. As the subhalo be- 



gins to lose mass and form tidal tails, this fraction grows to sa 0.20 
and w 0.25 at £1 and £2 respectively. The anisotropic velocity struc- 
ture and diffuse configuration of tidal streams makes accurately 
determine nearby neighbours nontrivial and consequently linking 
nearest neighbours is not the ideal method for identifying these 
structures with a high purity. 

We compare the estimated phase-space density of particles lo- 
cated in the tidal tails at £ > £ , which have /^* N ^ 0.5, to their 
nearest phase-space neighbours in the background population by 
calculating the average logarithmic phase-space density contrast, 
(log[J 1 (X)//(X)N g N ] ) . The distribution of (log[F(X)//(X)^ g N ]} 
at £1 and £2 is shown in the middle panel of Fig. 8. Although most 
of the particles in the tidal streams are denser than their surround- 
ings, ~ 20% are less dense. If we account for the local scatter 
by calculating the standard deviation SD(log[F(X)//(X)^ g N ]}, 
the fraction of particles that are not significantly denser than 
their surroundings, that is particles with (log[_F(X)//(X)b g v ]} — 
SD(log[F(X)//(X)£ g N ]) < 0, increases to w 25%. 

For this low density subset, we determine {\C/C^\}, the av- 
erage ratio of the particle's C value relative the nearest phase-space 
neighbours belonging to the background. The right panel of Fig. 8 
shows that these particles typically have much larger values despite 
appearing underdense. It is perhaps significant that these quantities 
do not appear strongly correlated for the particles residing in the 
diffuse tidal streams. 

These results indicate that particles in tidal streams may have 
negligible contrast in their estimated phase-space density relative to 
their surroundings due to their diffuse configuration and the inher- 
ent noise in the phase-space density estimator. Consequently, any 
search that used only phase-space density may miss identifying par- 
ticles residing in the very diffuse portions of the stream while en- 
suring a high purity whereas our algorithm will be able to identify 
them. 



3.6 Other single subhalo tests 

We have examined the impact of placing the subhalo on different 
orbits. In all cases, regardless of the subhalo's orbit, substructure 
was recovered with purity and recovery fractions of > 0.8 and 
> 0.6. For example, we analyzed a subhalo that was on a primarily 
radial orbit plunging through the center of the host with a peri- and 
apocenter of r per i,s = O.Ioh and r ap0i s = 7.5<xh = 0.67-Ra, re- 
spectively. Over the period analyzed, corresponded to several radial 
orbits during which the subhalo lost ~ 60% of its bound mass, the 
subhalo's £-distribution is similar to that of the subhalo on an ellip- 
tical orbit. That is the substructure distribution is a small secondary 
feature in the global C distribution and dominates the distribution 
for C > 4, though it is more strongly peaked than the distribu- 
tion in th elliptical orbit. Over the period examined, our algorithm 
identified a single group containing both the progenitor subhalo and 
the tidal streams and shells produced by the subhalo using the same 
fiducial threshold. The purity of the group was always > 0.95 while 
the recovery fraction dropped to r « 0.8 after starting at r > 0.9. 

It is interesting to note that in this example the false positives 
identified have orbits that do not strongly differ from the orbits of 
the particle in the substructure. Approximately 70% of these false 
positives remain in the orbital space of the subhalo over one ra- 
dial period. Again, it appears that some of the background has been 
swept up by the subhalo. Though it is beyond the scope of this 
work, it would be interesting to examine in detail this "mass accre- 
tion" and its dependence on the subhalo's orbit and position. 

These tests taken together show that a particle's C value does 
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Figure 8. Comparison of correctly identified group members with nearest neighbours in phase-space at three different times as seen in Fig. 5. Left: The normal- 
ized distribution of /? for all correctly identified group members along with a vertical line at /?" 

phase-space density contrast with respect to the background, (log[F(X)//(X)^ N ]}, of correctly identified group members with / c 
is a solid vertical line at (log[i ;, (X)//(X)^?']} = and a typical error bar for (log[F(X)//(X)J^ > ']}. In these panels, all the distributions are normalized 

of 



0.5. Middle: The normalized distribution of the average 
9 NN ^ 0.5. Also shown 



individually. Right: (log[.F(X)//(X)j^ ]} versus the average contrast in the normalized standard deviation with respect to the background, (£/£{,) 
particles with /N N < 0.5 and <log[F(X)//(X)NN]) _ SD(log[F(X)//(X)N g N ]} <g . Again we show a typical error bar for (log[F(X)//(X)N g N ]} 



not strongly depend on the particles position in the host halo itself 
or its orbit, but rather on whether the particle belongs to a veloc- 
ity substructure. Hence, our algorithm is ideally suited to finding 
substructure since C is an excellent discriminator. 



cally cold subhaloes. Particles with these high phase-space densi- 
ties account for 30% of the halo's mass initially and typically have 
(log[F(X)/7(X)£ g N ]> fa 3.5. After 5 Gyr, only fa 4% of the par- 
ticles have similarly high phase-space densities. 



4 TOWARDS A REALISTIC MODEL 

4.1 Constructing the test case and general evolution 

We next consider a halo with a large number of subhaloes. We have 
constructed a halo, as outlined below, where we know, a priori, 
where all the particles come from. The tests here serve as an inter- 
mediate step towards full cosmological haloes. 

To construct the test system, subhaloes were given cored 
isothermal density profiles with mass fractions from / = [5 x 
10 ,5 x 10 ] and were drawn from a mass distribution such 
that the number of subhaloes per logarithmic mass fraction is 
dn/d In / = Af~ 0,9 , where power-law index is the one commonly 
observed in cosmological simulations (e.g. Moore et al. 1999; Die- 
mand et al. 2008; Springel et al. 2008; Elahi et al. 2009). These 
studies also show that the typical amount of the host's mass in 
bound subhaloes in mass fraction range of / ~ 10~ 6 — 10~ 2 is 
~ 10%. To follow a large number of haloes at our chosen mass 
resolution we set the mass fraction to a much higher value of 30% 
of the host's mass yielding 1496 subhaloes in total. We placed sub- 
haloes by replacing particles of the host halo with radii from r — 
[a ,n, Ra] with subhaloes and set the subhalo's center-of-mass and 
center-of-mass velocity to the replaced particle's position and ve- 
locity. The resulting system has a total mass of 1.6 x 10 12 Mq 
and is composed of 1.42 x 10 6 particles. This system is evolved 
for 10 Gyr, corresponding to ~5 dynamical times for the halo at 
the virial radius. 

The phase-space structure of the halo is shown in Fig. 9 at 
three different times, a few Myr after the start of the simulation, 

5 Gyr later, and after 10 Gyr. We also highlight the evolution of 
three subhaloes in the bottom two panels of Fig. 9. Initially, there 
are numerous regions in halo's phase-space that contain particles 
with high phase-space density, corresponding to dense dynami- 



4.2 Evolution of subhaloes 

The smallest subhalo shown, (circles), is composed of fa 100 
particles and is initially on an approximately circular orbit with 
an orbital period of 3.7 Gyr. After 5 Gyr the particles of this 
subhalo occupy a large volume in phase-space and no longer 
have a comparatively high phase-space density relative to the 
background. The mean logarithm physical and phase overdensity 
are <log[p(x)Mx)™]) = .01 and (log[F(X)//(X)™]) = 
0.35 respectively, where the physical overdensity is computed 
relative to nearby physical neighbours. After 10 Gyr, the mean 
(log[p(x)/p(x)kgf ]} fa 0. Despite this low contrast and the fact 
that only fa 35% of the subhalo particles are denser than their sur- 
roundings by a factor of one standard deviation, fa 40% of the 
subhalo's particles have C > 3. This comparison indicates that we 
are capable of picking out 14% more of this substructure based 
on C compared to /(X) at a greater normalized contrasted, which 
should result in a higher purity. 

The second subhalo shown, (triangles), is composed of 600 
particles and is on a primarily radial orbit. The subhalo's core is still 
relatively intact till after 10 Gyr and it has generated extended tidal 
streams over this period. At t\, the fraction of the particles in this 
group with (log[F(X)//(X)£ g N ]) - SD(log[F(X)//(X)™]) > 
is fa 0.7 and the mean (log[F(X)//(X)£ g N ]} for the group is fa 
1.4. By t2 the fraction of overdense particles is fa 0.5 and the mean 
(log[F(X)//(X)^ g N ]} of the subhalo has dropped to fa 0.4. At all 
times, the particles in the tidal streams also have high £ values, 
even in cases where the particle has a low phase-space density. 

The most massive subhalo shown, (squares), is composed of 
fa 2000 particles and is on an elliptical orbit. Although this sub- 
halo appears to be completely disrupted after 5 Gyr or fa 2.5 
orbits and has a mean (log[F(X)//(X)^ g N ]) fa 0, fa 50% of 
the subhalo's particles reside in a specific region in the r — v r 
plane and still have comparatively higher phase-space densities 
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Figure 9. Phase-space r — v r plot of halo at three different times, t , ti, ti (top) and the evolution of three subhaloes (bottom two rows). Particles in the first 
two rows are colour coded according to their phase-space density estimated with E.NBID using the same logarithmic colour scale as the top row of Fig. 5. In 
the top row particles belonging to substructure are plotted over top of particles in the background. Particles in the bottom row are colour coded according to 
their C value using the same colour scale as the middle row of Fig. 5. Also shown are mass contours. 



than the background. After 10 Gyr, the fraction of particles with 
(log[J 1 (X)//(X)NN])_SD(log[f (X)//(X)™]} > is « 0.26 
though the fraction of these particles with C > 3 is fa 0.30. 



Figure 9 highlights the fact that disrupted subhaloes can leave 
distinct high phase-space density features in the r — v r plane, 
though the phase-space density contrast of these features decreases 
with time. More importantly, this figure shows that even when the 
estimated contrast in phase-space may be negligible after several 
orbits, the particles originating from dynamically distinct substruc- 
tures still appear to be velocity outliers. 



4.3 Substructures Identified 

To calculate the normalized deviations we again use N v = 32 
and 256 cells with JV cen = 5530. We use C th = 2.8, V r = 2, 
cos op = 0.97, and l^ fa 1.5^ x .h = 5 kpc as our fiducial values. 
We also test our iterative method where initial candidate groups are 
found using the fiducial values and A^m = 10, then the system 
is searched again with C reduced by 10%, V r and O op increased 
by 10%, and £ x is increased by 50% and N min = 20. We will de- 
note these two searches as STF-1.5 and STF-IT. For comparison, we 
use a 6DFOF algorithm to find regions of high phase-space density 
with^ x = ^ x ,h and£ v = 0.25K.A, where V C 2 A = GM(R a )/Ra 
is the circular velocity at the virial radius. The 6DFOF parameters 
are chosen by examining the phase-space contrast of several ran- 
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Figure 10. Distributions of groups identified at three different times for three different searches, STF-1.5 (solid), STF-IT (dashed), and 6DFOF (dotted): 
substructure mass function dn/d log / (left); radial distribution dn/d log r (middle); and radial mass distribution dm/d log r (right). The radial position of 
a group is given by r = \x cm j — X cm h|- Also shown in the left panel is a dashed-dotted line indicating the mass distribution used to generate the initial 
conditions. We also show the total number of groups found and total mass fraction found in substructures using each algorithm. 



domly chosen subhaloes and ensuring that the 6DFOF search iden- 
tifies > 95% of the subhaloes at t . We also set the minimum num- 
ber of particles for the 6DFOF search to 50, otherwise the 6DFOF 
algorithm at to finds spurious groups. 

In Fig. 10, we show mass and radial distributions of the groups 
identified by each search. The mass distribution power law at to 
is reproduced by all three algorithms, but as the system evolves 
tidal disruption reduces the number of substructures identified. Ac- 
cordingly, the total fraction of mass in substructure, / to t, also de- 
creases with time. Initially, 6DFOF finds roughly the same number 
of groups as STF, but as subhaloes are tidally disrupted it finds pro- 
gressively fewer groups compared to STF. At £2, after subtracting 
out the contribution from the halo core that is falsely identified by 
6DFOF the mass fraction in substructure it identifies is 20 — 50% 
less than that found with the STF searches. It also finds signficantly 
fewer structures than the STF searches. 

The radial distribution indicates that the number of velocity 



substructures within a few scale radii steadily decreases with time. 
This is not unexpected since the subhaloes that cross the very cen- 
tral region of the host experience the strongest tidal and impulsive 
forces and are more likely to be disrupted. The apparent dearth of 
substructure is in qualitative agreement with other studies (Springel 
et al. 2008; Diemand et al. 2008). However, the radial distribution 
for the STF groups suggests that the central regions is not com- 
pletely devoid of substructure. 

The overall performance of each search is summarized in Ta- 
ble 1 . The random placement of the subhaloes means that two sub- 
haloes may overlap in both configuration and phase-space. Conse- 
quently, we define the intrinsic class of a particle using the initial 
results from each group finder to determine the recovery fraction. 
All find > 1450 substructures. The small initial differences be- 
tween the searches are due to a few low mass subhaloes lying in 
close proximity to another subhalo in phase-space and being linked 
to it. 
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Table 1. Performance of STF and 6DFOF 



Time JV g 


P±a P max 

-^ mm 


D jr. _L „ max 

u, r, ± <j r min 


t 1476 






stf-1.5 h 968 


0.96 ± 0.10 \-°l 


267.5, 0.18 ±0.27 J 00 


t 2 655 


0.93 ± 0.12 J;g° 


77.2, 0.05 ±0.14 |j" 


t 1481 






STF-IT h 1018 


0.96 ±0.10 J;?? 


306.6, 0.21 ± 0.28 J 00 


t 2 675 


0.93 ± 0.14 J;^ 


108.9, 0.07 ±0.17 §•" 



t„ 1457 

6DFOF tl 607 

ii 195 



0.92 ±0.15 1;™ 



0.86 ±0.21 



1.00 

11.20 



171.4, 0.12 ±0.23 J 00 
38.4, 0.03 ±0.12 g" 



The mean purity of the groups identified by STF is > 0.90, 
which is slightly higher than 6DFOF. The dispersion in the purity is 
also smaller for STF and shows little evolution in time while 6DFOF 
shows a significant increase in dispersion at the final time. This in- 
crease can be attributed to the increasing number of streams at later 
times and the inability of 6DFOF to correctly identify them. The 
minimum purity found in the STF searches is a product of streams 
overlapping rather than incorrect identification of background par- 
ticles. Since the purity of a group is calculated by determining the 
dominant intrinsic class of a group, if several subhaloes are on sim- 
ilar orbits and merge and produce tidal tails, the calculated purity 
will low. For example, the group with the minimum purity iden- 
tified by STF-IT at t% is a massive substructure resulting from the 
merger of several subhaloes and several overlapping tidal streams. 
Only 3% of the particles belonging to this group were initially part 
of the halo background at to- 

STF recovers 0.45 of the smallest subhalo highlighted in Fig. 9 
at t\ even though it appears to be disrupted, although by £2 none of 
the subhalo's particles are recovered. A similar fraction of the mas- 
sive subhalo in this figure is recovered at £1, After 10 Gyr only 
~ 0.17 is recovered, despite the fact that the subhalo has been 
tidally disrupted and occupies a large physical volume while only 
accounting for ~ 5% of the mass in this volume. The highlighted 
subhalo with the extended radial tidal tails is identified at t\ and £2 
with recovery fractions of 0.8 and 0.6, respectively. 

The difference between STF-1.5 and STF-IT suggests that the 
iterative method is a significant improvement over the standard 
method. The recovery fraction increases by 14% and 40% at £1 and 
£2, respectively, while P is not significantly decreased. However, 
there is a significant decrease in the minimum purity of STF-IT at 
£1. This decrease is due to one instance where several groups have 
been merged together, including the most massive group found 
by STF-1.5. The increase in R and decrease in P using the itera- 
tive method depends on how significantly the FOF parameters are 
changed after the initial search. To determine an optimal fractional 
change in the FOF parameters, we have used the STF-1.5 search as 
a baseline and varied the change which increases R while keeping 
P unchanged. We find larger changes than those used here degrade 
the mean purity and increase the instances of high mass, low pu- 
rity groups that are the result of several substructures being linked 
together. 

In summary, the recovery fraction of the STF search is greater 
than that of the 6DFOF search by > 2. Furthermore, the mean re- 
covered fraction for each group is also significantly higher in the 
STF searches. Increasing £ x used by STF increases the recovery 
fraction, particularly of low mass tidally disrupted groups located at 
large radii without significantly reducing the purity. For all three al- 
gorithms the recovery fraction is < 1/4 times the initial number of 
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Figure 11. The distribution of groups found by STF-3.0 in the log(Vj) ± 
(log[p(x)/p(x)^ 1 ?']}-t;i so plane at two different times. We plot contours 
corresponding enclosing 0.75, 0.5, 0.25 and 0.1 of the population along 
with outliers (open points). We also show a sample of groups that corre- 
spond to bound subhaloes (large filled points). 



groups found, due to subhaloes being disrupted and phase-mixing 
with the background. 

4.4 Subhalo or Stream? 

Substructures can range from relatively intact subhaloes, to tidal 
streams originating from completely disrupted subhaloes as well 
as combinations of the two. A substructure identified by STF can 
be any one of these substructures. However, by examining the bulk 
properties of a substructure it should be possible to identify its spe- 
cific type. For instance, streams are likely to have anisotropic veloc- 
ity distributions, whereas intact subhaloes likely have isotropic dis- 
tributions. Subhaloes are physical overdensities, whereas streams 
are diffuse and correspond to a small fraction of the mass in the 
volume occupied by the stream. We use three parameters to clas- 
sify a group; the overdensity, compactness and velocity isotropy. 
The overdensity of a group is determined by the the average 
physical density contrast between the group and the background, 
(log[p(x)/p(x)bg ]), which is calculated in a similar fashion as 
(log[F(X)//(X)™]) limited to configuration space. The com- 
pactness is given by the volume mass fraction, V/, that is the frac- 
tion of mass belonging to the group in the volume occupied by the 
group. The velocity isotropy, «i so , is given by 



«iso = 



0"2 + 03 

2cti ' 



(18) 



where at are the ordered eigenvalues of the group's velocity dis- 
persion tensor afj. If the dispersion is isotropic V[ BO ~ 1. 

We plot the groups identified by STF-3.0 in the log(Vf) + 
(log[p(x)/p(x)^ , ])-Wiso plane in Fig. 11. For clarity, we plot con- 
tours of the groups found at £0 and £1 and outliers of these con- 
tours. We also check to see if a group is self-bound and plot a 
sample of bound groups for clarity since they are clustered in 
one region of this plane. Initially, most groups are compact over- 
densities, with log(V/) + (log[p(x)/p(x)b g N l) > 0, and have 
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isotropic velocities. Most of these groups are self-bound subhaloes, 
accounting for « 25% of the halo's mass. The few groups with 
very anisotropic velocities dispersions which are neither overdense 
nor compact are unbound substructures. By ti, dynamical pro- 
cesses have disrupted these subhaloes, causing some to form tidal 
tails and others to be completely disrupted. A substantial frac- 
tion of the groups, ~ 70%, at this time have Vi so < 1CP 1 and 
log(V/) + (log[p(x)/p(x)£ g N ]) < -0.5. Only » 4% of the groups 
identified at t\ are subhalo like, that is groups with log(V/) + 
{log[p(x) / p(x)^]} > and «i so ~ 1. The bound subhaloes 
identified at ti, (large filled square points), account for « 2% of 
the halo's mass. The few groups identified with anisotropic veloc- 
ities but log(V/) + (log[p(x)/p(x)b g N ]) > are extended tidal 
tails with the core of the subhalo embedded inside. The remainder 
are unbound tidal streams. By t%, the fraction of groups we would 
classify as subhaloes has decreased to w 2% while the fraction of 
tidal stream groups has remained relatively constant. 



4.5 Inner Region 

As previously mentioned, Fig. 10 suggests that there are numerous 
substructures located in the very inner region of the halo at late 
times. However, due to the complicated morphology and extended 
nature of tidal streams, the center-of-mass need not even be located 
in the substructure itself. As a result this radial distribution may 
be misleading. In order to verify if we can find substructures in the 
very central regions of a halo, we plot the positions and velocities of 
all particles belonging to an STF group at ti in Fig. 12. This figure 
clearly shows that we do indeed find numerous streams crossing 
this region. 

These velocity substructures account for ~ 1% of the mass 
in this central region. Some of these particles belong to large tidal 
streams that were produced by massive subhaloes which lie out- 
side this region. Other particles belong to small substructures that 
are located in this region. An example of such a substructure is the 
spheroidal velocity substructure located at the center. The group 
has a high purity, p ~ 0.6, and primarily originates from a single 
subhalo at to. Most of particles identified have high relative veloc- 
ities, lying outside the background's 68% contour. Roughly 70% 
of the particles have velocities that are > 2<r away from the mean 
velocity. Notably, not all the groups identified have high velocities. 
The spherical substructure located near the center is actually within 
the velocity distribution of the background but is dynamically much 
colder. 

This figure indicates that that the central regions of haloes are 
not devoid of substructure even if the phase-space structure appears 
relatively smooth. It also clearly demonstrates that STF is capable 
of identifying these structures even in a low resolution N-body re- 
alizations of haloes containing ~ 10 6 particles. 



5 DISCUSSION AND CONCLUSION 

In this paper, we present an algorithm designed to identify both 
tidal streams and subhaloes. We begin by dividing the halo into 
cells. In each cell, we model the velocity distribution function 
of the background as a multivariate Gaussian (the "Maxwellian 
sea"). While there is no exact method for identifying and modelling 
the background population, we have demonstrated that decompos- 
ing the halo into sub-volumes (cells) containing a relatively large 
number of particles is sufficient to average out any substructures 
present. We then calculate an effective velocity-space density for 



each particle in the cell. The ratio of the local velocity density to the 
density of the background at the particle's velocity provides a prob- 
abilistic measure of whether the particle belongs to the background 
or to a stream or subhalo. This key step of identifying outliers is 
then combined with a FOF-like search to identify substructures. 

We have clearly shown using several test cases that our method 
is capable of identifying tidal streams and any subhaloes present in 
a halo. The purity of the identified substructures is very high re- 
gardless of the type of substructure, whether subhalo or diffuse tidal 
stream. Our method is even capable of identifying streams with a 
high purity in cases where the particles of the stream do not appear 
significantly overdense in either physical or phase-space for a given 
phase-space estimator, and account for a small fraction of the mass 
in the spatial volume occupied by the stream. Furthermore, we are 
able to effectively identify substructures in the cores of haloes, even 
haloes with low resolution. 

We also note that our key step of identifying velocity outliers 
can also be applied to any quantity for which one can character- 
ize the mean distribution. The increased contrast between parti- 
cles belonging to substructures and those belonging to the back- 
ground makes searching the resulting subset simpler. Furthermore, 
one could combine this key step with a variety of searches tailored 
to identify specific substructures. 

Our algorithm is not the only one capable of finding tidal 
streams. Both HSF (Maciejewski et al. 2009) and ENLINK (Sharma 
& Johnston 2009) should be able to find tidal streams, however, 
both these algorithms rely on accurately estimating the phase-space 
density for each particle. In general, determining the velocity den- 
sity is less computationally intensive than accurately computing the 
phase-space density. Searching the phase-space of a halo requires 
methods that calculate the local six dimensional metric for each 
particle to accurately determine distances in phase-space. We have 
also shown that, in certain cases, identifying substructures solely 
based on the phase-space density may miss diffuse tidal streams. 

The presence of tidal streams in the central regions of haloes 
has potentially important ramifications for indirect and direct dark 
matter searches. For instance, most studies of direct dark matter de- 
tection assume the velocity distribution of dark matter in the solar 
neighbourhood of a galactic mass halo is Maxwellian in order to 
determine the detection rate and recoil spectrum. Simulations show 
that the velocity distribution is more complex, possibly due to the 
presence of substructure. A few studies have attempted to go be- 
yond this simple assumption but are limited by the resolution of 
the studies used to determine the velocity distribution (Fairbairn & 
Schwetz 2009; Kuhlen et al. 2010). Since our algorithm allow us 
to accurately characterize the stream distribution function with a 
single snapshot from a cosmological simulation, it provides a eas- 
ily accessible avenue for going beyond the resolution of current 
simulations. By searching numerous haloes from numerous cos- 
mological simulations and stacking results, we will be able to de- 
termine the distribution function of these streams and extrapolate it 
to improve the accuracy of detection rates and the observed energy 
spectrum. 
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